This chapter concludes this thesis by summarizing its content and answering the research questions.
Section \ref{sec:conclusions-contributions} describes the contributions of this thesis.
Finally, Section \ref{sec:conclusions-future} motivates several interesting directions for future research.

The platform selected in this thesis is the Parrot AR.Drone quadrotor helicopter.
A simulation model of the AR.Drone is proposed, which allows safe and efficient development of algorithms.
This simulation model consists of a motion model, visual model and sensor model.
The validation effort of the hovering and the forward movement shows that the dynamic behavior of the simulated AR.Drone closely resembles the dynamic behavior of the real AR.Drone.

A framework is proposed to perform high-level tasks with the AR.Drone.
This framework includes an abstraction layer to abstract from the actual device, which allows to use both the real and simulated AR.Drone to validate the proposed methods.
The framework stores video frames and sensor measurements from the AR.Drone in queues, such that algorithms can operate on the data.
Furthermore, the framework includes a 3D map visualization and provides functionalities to record and playback data.

This thesis proposed a real-time Simultaneous Localization and Mapping approach that can be used for micro aerial vehicles with a low-resolution down-pointing camera (e.g., AR.Drone).
The information from the sensors are used in an Extended Kalman Filter (EKF) to estimate the AR.Drone's pose.
The position of the AR.Drone cannot be estimated directly and is derived from the velocity estimates from the AR.Drone.
This estimate is based on the inertia measurements, aerodynamic model and visual odometry obtained from the relative motion between camera frames.
When an estimate of the AR.Drone's pose is available, a map can be constructed to make the AR.Drone localize itself and reduce the error of the estimated pose.
The map consists of a texture map and a feature map.
The texture map is used for human navigation and the feature map is used by the AR.Drone to localize itself.

The texture map is constructed by warping a camera frame on a flat canvas.
A homography matrix describes how each frame has to be transformed (warped) in order to generate a seamless texture map without perspective distortion.
For each frame, the estimated pose and camera matrix are used to determine the area of the world which is observed by the camera.
This area is described by four 2D coordinates (a flat world is assumed).
Now, the homography is estimated from the four 2D world positions and their corresponding frame's pixel coordinates.

The feature map consists of a grid with SURF features.
For each camera frame, SURF features are extracted.
The estimated pose and camera matrix are used to compute the 2D world position of each feature.
Based on the 2D world position, a feature's cell inside the grid is computed.
Only the best feature of a cell is stored.

This feature map is used by the AR.Drone to localize itself.
For each camera frame, SURF features are extracted.
Similar to the mapping approach, the corresponding 2D world position of each feature is computed.
Each feature is matched against the most similar feature from the feature map.
From the feature pairs, a robust transformation between the 2D world coordinates is estimated.

An approach is proposed to compute the transformation between a frame and a map.
The method is very similar to RANSAC, but uses covariance instead of the number inliers as quality measure.
Three random feature pairs are selected.
From these pairs, the 2D world coordinates are used to compute the mean translation.
The covariance describes the quality of a translation.
These steps are repeated multiple times and the best translation is selected.
If a translation is very good (i.e., low covariance), Kabsch algorithm is used to refine the AR.Drone's estimated rotation.
Experiments show that the proposed method significantly outperforms OpenCV's implementation to estimate a perspective, affine or Euclidean transformation.
By reducing the degrees of freedom and using a beter quality measure, the robustness against noise increases.

A slightly modified version of the localization approach is used to perform visual odometry (estimate the velocity from camera images).
Instead of matching the last frame against the feature map, now the last frame is matched against the previous frame.
This estimated velocity can be used instead 
% or in combination with
 the AR.Drone's onboard velocity estimate.
Experiments show that the accuracy of the proposed visual odometry method is comparable to the AR.Drone's onboard velocity estimate.

Others experiments show the AR.Drone down-pointing camera offers good balance between image quality and computational performance.
When the resolution of the AR.Drone's camera is reduced, the accuracy of the estimated position decreases.
The reduced image quality results in less frames that could be used to robustly recover the velocity.
A simulated AR.Drone was used to investigate an increased camera resolution.
An increased image resolution does not necessarily produce more accurately estimated positions.
While relatively more frames could be used to used to robustly recover the velocity, the absolute number of processed frames decreased due to the increased computational complexity.
Therefore, the velocities are estimated less frequent.

Another experiment shows that localization is possible for circumstances encountered during the IMAV competition.
When flying the figure-8 shape three times, the maximum position error was $1\small{m}$.
%During a flight of $x\small{s}$, the visual odometry was recovered $x$ times, and localization against a map was performed $x$ times.

In addition to the texture and feature map, this thesis proposed a method to construct an elevation map using a single airborne ultrasound sensor.
Elevations are detected through sudden changes in ultrasound measurements.
These sudden changes are detected when the filtered second order derivative exceeds a certain threshold.
Experiments show that the relative error when flying over small objects is below $20\%$.
When flying over a large staircase, the relative error of the estimated altitude was $35\%$.









\section{Contributions}
\label{sec:conclusions-contributions}

The first contribution of this thesis is a framework that aids in the development of (intelligent) applications for the AR.Drone.
The framework contains an abstraction layer to abstract from the actual device.
This enables to use a simulated AR.Drone in a way similar to the real AR.Drone.
Therefore, a simulation model of the AR.Drone is developed in USARSim.
The main contribution is a novel approach that enables a MAV with a low-resolution down-looking camera to navigate in circumstances encountered during the IMAV competition.
An algorithm is presented to estimate the transformation between a camera frame and a map.
In addition to the navigation capabilities, the approach generates a texture map, which can be used for human navigation.
Another contribution is a method to construct an elevation with a single airborne ultrasound sensor.

Parts of this thesis have been published in:

\vspace{\baselineskip}

N. Dijkshoorn and A. Visser, "Integrating Sensor and Motion Models to Localize an Autonomous AR. Drone", \textit{International Journal of Micro Air Vehicles}, volume 3, pp. 183-200, 2011

\vspace{\baselineskip}

A. Visser, N. Dijkshoorn, M. van der Veen, R. Jurriaans, "Closing the gap between simulation and reality in the sensor and motion models of an autonomous AR.Drone", \textit{Proceedings of the International Micro Air Vehicle Conference and Flight Competition (IMAV11)}, 2011

\vspace{\baselineskip}

N. Dijkshoorn and A. Visser, "An elevation map from a micro aerial vehicle for Urban Search and Rescue - RoboCup Rescue Simulation League", \textit{to be published on the Proceedings CD of the 16th RoboCup Symposium, Mexico}, June 2012

\section{Future research}
\label{sec:conclusions-future}
%This thesis provides a broad range of directions for future research.

The proposed simulation model of the AR.Drone is validated in terms of hovering and forward movements. 
Further improvements would require more system identifications (e.g., rotational movements, wind-conditions) and include the characteristics of the Parrot's proprietary controller into the simulation model. 
The current sensor model uses a set of default USARSim sensors, which do not model noise accurately.
This research would benefit from a realistic modeling of sensor noise.

The proposed SLAM method is able to construct a map and localize the AR.Drone in real-time.
However, the proposed method lacks a global map optimization method, which reduces the error in the map when a loop-closure is detected.
An interesting research direction is to develop a global map optimization that is able to optimize both the feature map and texture map in real-time.

Experiments have shown that the proposed visual odometry method is unable to accurately estimate the velocity when insufficient texture is visible on the floor.
In these circumstances, the AR.Drone's onboard optical-flow based visual odometry method is able to estimate the velocity more accurately.
However, salient lines on the floor cause the AR.Drone's optical-flow method to estimate the velocity in the wrong direction.
This problem can be solved by integrating additional information (e.g., aerodynamic model) into the optical-flow method, to correct the direction of the estimated velocity.
When such a method is incorporated in the proposed SLAM method, it would yield an increased accuracy of the estimated velocities in low-texture conditions.

The elevation mapping experiment has shown that elevation mapping with a single ultrasound sensor is possible.
However, the proposed method is sensitive to errors that accumulate.
When an object is detected, the elevation is increased.
When an object is out of range, the elevation is decreased.
When both events do not observe an equal elevation, an incorrect elevation propagates into the elevation map.
A possible solution could be a global optimization method, which is able to correct incorrect elevations.
For example, large areas for which the elevation does not change can be assumed to be the floor.
Furthermore, the current approach is unable to estimate the dimensions of an object in all directions.
Visual clues could be used to estimate the dimensions of an object.
For example, image segmentation can be used to determine the boundaries of an object.
Another possible solution is to create a depth map from optical flow.
Just as stereo vision can be used to create a depth map of the environment, the optical flow between two subsequent frames can be used to create a depth map \cite{Jurriaans2011}.

The proposed framework uses a single thread to process the camera frames.
Therefore, the proposed SLAM method is unable to benefit from an increased image resolution.
This issue can be solved by processing multiple frames in parallel (using multiple video processing threads), to make better use of a multicore CPU.